Title

Title

Predicting Alpha Thalassemia Phenotype using Clinical Measures

Ashley Williams

Abstract

Alpha thalassemia is a prevalent genetic disorder with a wide spectrum of clinical severity, ranging from asymptomatic carrier states to fatal severe anemia. Genotype-phenotype correlations are generally strong but variable, necessitating reliable prognostic tools for clinical management and improved disease prevention. This project aimed to develop and validate a logistic regression model to predict the clinical phenotype (either silent carrier or alpha trait) of alpha thalassemia carriers from routine clinical variables.

Results indicate that hemoglobin concentration, red blood cell count, and lymphocyte percentage, taken together, are informative predictors of phenotype. The final model achieved 81.4% accuracy on held-out test data (AUC = 0.878) and allows for easy interpretation, demonstrating its potential use as a clinical decision support tool. The resulting model provides a cost-effective method to aid in alpha thalassemia assessment and prevention, with particular application in low-resource settings where advanced testing is inaccessible.

Background

Background

Alpha thalassemia is an inherited blood disorder in which the body produces an insufficient amount of hemoglobin, leading to anemia.

  • Alpha thalassemia occurs when 1 or more of the 4 total alpha-globin genes (2 inherited from each parent), which contribute to the synthesis of hemoglobin molecules, are mutated or deleted.
  • There are multiple types of alpha thalassemia with a range of severities. In this project, I focus on the following:
    • Alpha thalassemia silent carrier: One alpha-globin gene is affected; the other 3 are wild type. Blood tests are often normal, although red blood cells may be smaller than normal. Silent carriers show no signs of the disease but can pass the affected gene on to their children. Carrier status is confirmed by DNA testing.
    • Alpha thalassemia trait carrier: Two genes are affected. The patient is likely to have mild anemia.
  • Having 3 affected genes leads to Hemoglobin H disease, where the patient has moderate to severe anemia. Having all 4 genes affected causes severe anemia, which in most cases leads to prenatal death.
  • There is no cure for alpha thalassemia, so effective screening to detect carriers is vital to prevention. Screening programs face many challenges, especially in low-resource settings: confirming carrier status for alpha thalassemia requires genetic testing, which is expensive and not widely available. Predictive models that act as decision-support tools are therefore valuable, because they are easy to deploy and use where other options are limited.

Research Questions

Variables of Interest

  • This dataset contains 16 variables in total; the following were considered in this project:
    • hb, Hemoglobin concentration in grams per decilitre (g/dL)
    • rbc, Red blood cell count in 10^12/L
    • lymph, Percentage of white blood cells that are lymphocytes
    • neut, Percentage of white blood cells that are neutrophils
    • plt, Total platelet count in 10^6/L
    • phenotype, Phenotype of the patient, either Silent Carrier or Alpha Trait

Source & Cleaning

I obtained my data for this project from Kaggle.

  • About the dataset
    • This dataset is from a database of 288 cases from the Human Genetics Unit (HGU) of the Faculty of Medicine, Colombo, Sri Lanka.
    • The data used in this project (n=147) were collected from children who are alpha thalassemia carriers and from their family members, screened from 2016 to 2020.
  • Data Cleaning
    • There is one missing value in this data set, for the variable mch (mean corpuscular hemoglobin). Because I do not use this variable in my model, it is not a concern.
    • I next converted the categorical variables sex and phenotype into factors as described below, such that they can be used in my logistic regression approach.
      • Sex, where Male = 0 and Female = 1
      • Phenotype, where Silent Carrier = 0 and Alpha Trait = 1
    • Finally, I checked the distributions of all the variables for outliers. There were some, but none were outside the realm of biological possibility, so all 147 observations used in this project were retained.

Data Table

Data Cleaning Intro

Data Cleaning Histogram

Methods and EDA

EDA Analysis

  • going to discuss changes in median/spread of data between the silent carrier and alpha trait phenotype for each variable. Maybe exclude some that I don’t consider in my model?

Methods

For this project, I am going to employ a binary logistic regression approach.

  • Logistic regression is a method used to predict the probability of a binary outcome, that is, one of two mutually exclusive events.

    • In this case, it predicts the probability of a person being either a silent carrier or having the alpha trait phenotype.
  • Logistic regression analyzes the relationship between the target and predictor variables by using a logistic function to model the probability of an event occurring, rather than a continuous value as in linear regression.

  • To complete my logistic regression approach, I use several R packages, including caret, nnet, pROC, and pscl.

Phenotype

Hb

Pcv

rbc

lymph

plt

mch

mchc

rdw

Model Performance

Set up

  • Before fitting the logistic regression, the data need to be partitioned into two groups.
      1. Training data, used to estimate model parameters.
      2. Test data, used to assess how well the model works on new, unseen data.
  • 70% of the data were used for training, and 30% were reserved for testing.
Training data:

silent_carrier    alpha_trait 
            74             30 
Testing data:

silent_carrier    alpha_trait 
            31             12 

Model

The final model is phenotype ~ hb + rbc + lymph.

This model was selected after testing several full and reduced models, as it ultimately had the best performance on key logistic regression outputs.

A binary logistic regression model was fitted to predict phenotype from hemoglobin concentration (hb), red blood cell count (rbc), and the percentage of white blood cells that are lymphocytes (lymph).


Call:
glm(formula = formula_logit, family = binomial, data = train)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.57589    2.24398   1.594 0.111038    
hb          -0.69551    0.20504  -3.392 0.000694 ***
rbc          0.95577    0.46345   2.062 0.039182 *  
lymph       -0.03237    0.01848  -1.752 0.079794 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.96  on 103  degrees of freedom
Residual deviance: 109.47  on 100  degrees of freedom
AIC: 117.47

Number of Fisher Scoring iterations: 4

Goodness of fit

Analysis of Deviance Table

Model 1: phenotype ~ 1
Model 2: phenotype ~ hb + rbc + lymph
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
1       103     124.96                        
2       100     109.47  3   15.487 0.001444 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The likelihood ratio test (LRT) compares the logistic regression model to a null model containing only an intercept. The residual deviance decreases from 124.96 to 109.47 when the predictors are added, producing a test statistic of 15.487 on 3 degrees of freedom (p = 0.0014). This small p-value indicates that the predictors significantly improve model fit compared to the null model, so the included clinical variables carry meaningful explanatory power for predicting the phenotype of alpha thalassemia.

fitting null model for pseudo-r2
        llh     llhNull          G2    McFadden        r2ML        r2CU 
-54.7364213 -62.4799152  15.4869877   0.1239357   0.1383562   0.1978586 

Pseudo-R2 values provide additional measures of model fit for logistic regression. McFadden’s pseudo-R2 was 0.1239, which indicates a modest fit. The maximum-likelihood (Cox & Snell) R2 was 0.1384 (r2ML), similarly suggesting improvement over the null model. Nagelkerke’s pseudo-R2 (r2CU) was 0.1979, meaning the model achieves about 19.79% of the maximum possible improvement in fit relative to the null model. Overall, these values indicate that the logistic regression model provides a modest to moderate fit to the data and predicts phenotype better than the null model.

Key effects

(Intercept)          hb         rbc       lymph 
 35.7263857   0.4988216   2.6006603   0.9681505 
  • For each additional 1 g/dL of hemoglobin concentration, the odds of the patient presenting the alpha trait phenotype decrease by about 50.12%, holding all other variables constant. Here, 50.12% comes from 1 − 0.4988 = 0.5012.

  • For each 1x10^12 cells/L increase in red blood cell count, the odds of the patient presenting the alpha trait phenotype more than double (about 2.6 times), holding all other predictors constant.

  • For each additional percentage point of white blood cells that are lymphocytes, the odds of the patient presenting the alpha trait phenotype decrease by about 3.18%, holding all other variables constant, although this effect is only marginally significant (p = 0.080). Here, 3.18% comes from 1 − 0.9682 = 0.0318.

CM

Confusion Matrix and Statistics

                Reference
Prediction       silent_carrier alpha_trait
  silent_carrier             29           6
  alpha_trait                 2           6
                                         
               Accuracy : 0.814          
                 95% CI : (0.666, 0.9161)
    No Information Rate : 0.7209         
    P-Value [Acc > NIR] : 0.1144         
                                         
                  Kappa : 0.485          
                                         
 Mcnemar's Test P-Value : 0.2888         
                                         
            Sensitivity : 0.5000         
            Specificity : 0.9355         
         Pos Pred Value : 0.7500         
         Neg Pred Value : 0.8286         
             Prevalence : 0.2791         
         Detection Rate : 0.1395         
   Detection Prevalence : 0.1860         
      Balanced Accuracy : 0.7177         
                                         
       'Positive' Class : alpha_trait    
                                         

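Using a 0.5 probability cutoff on the test data, the model achieved an accuracy of 81.4%, with 50% sensitivity and 93.55% specificity. In other words, it correctly identified 6 of the 12 alpha trait cases and 29 of the 31 silent carriers.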

ROC/AUC

The ROC curve yielded an AUC of 0.878, indicating excellent discrimination between patients with the silent carrier phenotype and those with the alpha trait phenotype.

Conclusion

This logistic regression model demonstrates that a small set of clinical predictors provides explanatory and predictive power for assessing phenotype of Alpha Thalassemia patients, while maintaining easy interpretability for use in clinical screenings and application in disease prevention.

Discussion & Limitations

Conclusions

Limitations

  • small sample size

Future Directions

  • apply this to global demographic
  • develop approaches for additional phenotypes of Alpha thalassemia

About the Author

Background

My name is Ashley Williams and I am an undergraduate student attending the University of Dayton. I am majoring in Biology and I am minoring in Chemistry, Data Analytics, Neuroscience, and Research in the Biological Sciences. My anticipated graduation is in May of 2027.

I am an undergraduate researcher and have co-authorship of two peer-reviewed scientific papers, one from 2022 and one from this year! I conduct my research in the Williams Lab, where I specifically study the regulation of the Drosophila melanogaster pale gene, and its origin during the evolution of a dimorphic pigmentation trait. I have been heavily involved in scientific research since 2021, and I have also presented my research on numerous occasions including twice at the University of Dayton’s Stander Symposium, at the Society for Developmental Biology’s 83rd Annual Meeting, and at the American Society for Biochemistry and Molecular Biology’s conference, “Evolution and core processes in gene regulation”.

I am interested in pursuing a Ph.D. in the field of genetics after my graduation, and continuing my career in academia and biological research.

Presenting

---
title: "AT Analysis"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "#b53533"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
pacman::p_load(caret, nnet, pROC, pscl, tidyverse, DT)
data <- read.csv("twoalphas.csv")

data$phenotype <- ifelse(data$phenotype == "alpha trait", 1, 0)
#alpha trait = 1
#female = 1

data$sex <- ifelse(data$sex == "female", 1, 0)

data2 <- data %>%
  mutate(
    phenotype = factor(phenotype, levels = c(0, 1),
                    labels = c("silent_carrier", "alpha_trait")),
    sex    = factor(sex, levels = c(0, 1),
                    labels = c("Male", "Female")))
```

Title
===

Column {data-width=450}
---

### <b><span Style="color:#4f0c0b">Title</span></b>

<font size=8><b><span Style="color:#b53533">Predicting Alpha Thalassemia Phenotype using Clinical Measures</span></b></font>
  
  <font size=6><b><span Style="color:#d15c5a">Ashley Williams</span></b></font>

Column {data-width=550}
---

### <b><span Style="color:#4f0c0b">Abstract</span></b>

Alpha thalassemia is a prevalent genetic disorder with a wide spectrum of clinical severity, ranging from asymptomatic carrier states to fatal severe anemia. Genotype-phenotype correlations are generally strong but variable, necessitating reliable prognostic tools for clinical management and improved disease prevention. This project aimed to develop and validate a logistic regression model to predict the clinical phenotype (either silent carrier or alpha trait) of alpha thalassemia carriers from routine clinical variables.

Results indicate that hemoglobin concentration, red blood cell count, and lymphocyte percentage, taken together, are informative predictors of phenotype. The final model achieved 81.4% accuracy on held-out test data (AUC = 0.878) and allows for easy interpretation, demonstrating its potential use as a clinical decision support tool. The resulting model provides a cost-effective method to aid in alpha thalassemia assessment and prevention, with particular application in low-resource settings where advanced testing is inaccessible.

Background
===

Column {.tabset data-width=500}
-----------------------------------------------------------------------

### <font size=2.8><span Style="color:#4f0c0b">Background</span></font>

<span Style="color:#b53533">Alpha thalassemia is an inherited blood disorder in which the body produces an insufficient amount of hemoglobin, leading to anemia.</span>

  - <span Style="color:#d15c5a">Alpha thalassemia occurs when 1 or more of the 4 total alpha-globin genes (2 inherited from each parent), which contribute to the synthesis of hemoglobin molecules, are mutated or deleted.</span>
  - <span Style="color:#b53533">There are multiple types of alpha thalassemia with a range of severities. In this project, I focus on the following:
    - <b>Alpha thalassemia silent carrier:</b> One alpha-globin gene is affected; the other 3 are wild type. Blood tests are often normal, although red blood cells may be smaller than normal. Silent carriers show no signs of the disease but can pass the affected gene on to their children. Carrier status is confirmed by DNA testing.
    - <b>Alpha thalassemia trait carrier:</b> Two genes are affected. The patient is likely to have mild anemia.</span>
  - <span Style="color:#d15c5a">Having 3 affected genes leads to Hemoglobin H disease, where the patient has moderate to severe anemia. Having all 4 genes affected causes severe anemia, which in most cases leads to prenatal death.</span>
  - <span Style="color:#b53533">There is no cure for alpha thalassemia, so effective screening to detect carriers is vital to prevention. Screening programs face many challenges, especially in low-resource settings: confirming carrier status for alpha thalassemia requires genetic testing, which is expensive and not widely available. Predictive models that act as decision-support tools are therefore valuable, because they are easy to deploy and use where other options are limited.</span>
  
### <font size=2.8><span Style="color:#4f0c0b">Research Questions</span></font>

### <font size=2.8><span Style="color:#4f0c0b">Variables of Interest</span></font>

- <span Style="color:#b53533">This dataset contains 16 variables in total; the following were considered in this project:</span>
  - `hb`, Hemoglobin concentration in grams per decilitre (g/dL)
  - `rbc`, Red blood cell count in 10^12/L
  - `lymph`, Percentage of white blood cells that are lymphocytes
  - `neut`, Percentage of white blood cells that are neutrophils
  - `plt`, Total platelet count in 10^6/L
  - `phenotype`, Phenotype of the patient, either Silent Carrier or Alpha Trait

### <font size=2.8><span Style="color:#4f0c0b">Source & Cleaning</span></font>

<span Style="color:#b53533">I obtained my data for this project from [Kaggle](https://www.kaggle.com/datasets/letslive/alpha-thalassemia-dataset?select=twoalphas.csv).</span>

- <span Style="color:#d15c5a">About the dataset</span>
  - <span Style="color:#b53533">This dataset is from a database of 288 cases from the Human Genetics Unit (HGU) of the Faculty of Medicine, Colombo, Sri Lanka.</span> 
  - <span Style="color:#d15c5a">The data used in this project (n=147) were collected from children who are alpha thalassemia carriers and from their family members, screened from 2016 to 2020.</span>
 
- <span Style="color:#b53533">Data Cleaning</span>
  - <span Style="color:#d15c5a">There is one missing value in this data set, for the variable `mch` (mean corpuscular hemoglobin). Because I do not use this variable in my model, it is not a concern.</span>
  - <span Style="color:#b53533">I next converted the categorical variables `sex` and `phenotype` into factors as described below, such that they can be used in my logistic regression approach.</span>
    - <span Style="color:#d15c5a">Sex, where <b>Male = 0</b> and <b>Female = 1</b></span>
    - <span Style="color:#d15c5a">Phenotype, where <b>Silent Carrier = 0</b> and <b>Alpha Trait = 1</b></span>
  - <span Style="color:#d15c5a">Finally, I checked the distributions of all the variables for outliers. There were some, but none were outside the realm of biological possibility, so all 147 observations used in this project were retained.</span>

Column {.tabset data-width=500}
-----------------------------------------------------------------------

### <span Style="color:#4f0c0b">Data Table</span>
  
```{r}
datatable(data[1:50,], rownames=FALSE)
```

### <span Style="color:#4f0c0b">Data Cleaning Intro</span>
```{r}
library(DataExplorer)
plot_intro(data)
```

### <span Style="color:#4f0c0b">Data Cleaning Histogram</span>
```{r}
plot_histogram(data)
```


Methods and EDA
===

Column {.tabset data-width=500}
---

### EDA Analysis

- going to discuss changes in median/spread of data between the silent carrier and alpha trait phenotype for each variable. Maybe exclude some that I don't consider in my model?

### Methods

For this project, I am going to employ a binary logistic regression approach.

- Logistic regression is a method used to predict the probability of a binary outcome, that is, one of two mutually exclusive events.
  - In this case, it predicts the probability of a person being either a silent carrier or having the alpha trait phenotype.
- Logistic regression analyzes the relationship between the target and predictor variables by using a logistic function to model the probability of an event occurring, rather than a continuous value as in linear regression.

- To complete my logistic regression approach, I use several R packages, including caret, nnet, pROC, and pscl.
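
Written out, the logistic function described above takes the following general form. The symbols $x_1, \dots, x_k$ (predictors) and $\beta_0, \dots, \beta_k$ (coefficients estimated from the training data) are generic notation introduced here for illustration, with $Y = 1$ standing for alpha trait and $Y = 0$ for silent carrier, as in the coding described under Source & Cleaning:

$$
p = \Pr(Y = 1 \mid x_1, \dots, x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\quad\Longleftrightarrow\quad
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

The right-hand form shows why coefficients are interpreted on the log-odds scale: exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor.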

Column {.tabset data-width=500}
---

### Phenotype

```{r}
ggplot(data2, aes(x=phenotype))+geom_bar(fill="#d15c5a", color="black")+labs(title="Distribution of Phenotype", x="Phenotype", y="Count") + geom_text(aes(x="alpha_trait", y=47, label="42"))+geom_text(aes(x="silent_carrier", y=110, label="105"))
```

### Hb

```{r}
ggplot(data2, aes(x=phenotype, y=hb))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of Hemoglobin concentration by phenotype", x="phenotype", y="hb (g/dL)")
```

### Pcv

```{r}
ggplot(data2, aes(x=phenotype, y=pcv))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of PCV/hematocrit % by phenotype", x="phenotype", y="pcv/hematocrit %")
```


### rbc

```{r}
ggplot(data2, aes(x=phenotype, y=rbc))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of RBC Count by phenotype", x="phenotype", y="rbc (10^12/L)")
```


### lymph

```{r}
ggplot(data2, aes(x=phenotype, y=lymph))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of lymph by phenotype", x="phenotype", y="lymph")
```

### plt

```{r}
ggplot(data2, aes(x=phenotype, y=plt))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of plt by phenotype", x="phenotype", y="plt")
```


### mch

```{r}
ggplot(data2, aes(x=phenotype, y=mch))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of mch by phenotype", x="phenotype", y="mch")
```

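### mchc

```{r}
# Assumes the dataset has an `mchc` column, following the same pattern as the other tabs.
ggplot(data2, aes(x=phenotype, y=mchc))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of mchc by phenotype", x="phenotype", y="mchc")
```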


### rdw

```{r}
ggplot(data2, aes(x=phenotype, y=rdw))+geom_boxplot(fill="#d15c5a")+labs(title="Distribution of rdw by phenotype", x="phenotype", y="rdw")
```

Model Performance
===

Column {.tabset data-width=500}
---

### Set up

- Before fitting the logistic regression, the data need to be partitioned into two groups.
  - (1) Training data, used to estimate model parameters.
  - (2) Test data, used to assess how well the model works on new, unseen data.
  
- 70% of the data were used for training, and 30% were reserved for testing.
  
```{r}
library(caret)
set.seed(11)
idx <- createDataPartition(data2$phenotype, p = 0.7, list = FALSE)
train <- data2[idx, ]
test  <- data2[-idx, ]
```

Training data:
```{r}
table(train$phenotype)
```

Testing data:
```{r}
table(test$phenotype)
```

### Model

The final model is `phenotype ~ hb + rbc + lymph`.

This model was selected after testing several full and reduced models, as it ultimately had the best performance on key logistic regression outputs.

A binary logistic regression model was fitted to predict `phenotype` from hemoglobin concentration (`hb`), red blood cell count (`rbc`), and the percentage of white blood cells that are lymphocytes (`lymph`).


```{r}
formula_logit <- phenotype ~ hb + rbc + lymph
logit_model <- glm(formula_logit, data = train, family = binomial)
summary(logit_model)
```
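
As a rough illustration of how the fitted coefficients turn into a prediction, the sketch below plugs the estimates printed by `summary(logit_model)` in the rendered dashboard into the logistic function for a single patient. The patient values (`hb = 12`, `rbc = 5`, `lymph = 35`) and the object names are hypothetical and chosen only for this example; they are not taken from the dataset.

```r
# Coefficient estimates copied from the rendered summary(logit_model) output.
beta <- c(intercept = 3.57589, hb = -0.69551, rbc = 0.95577, lymph = -0.03237)

# Hypothetical patient (illustrative values only, not from the data).
# The leading 1 multiplies the intercept.
patient <- c(intercept = 1, hb = 12, rbc = 5, lymph = 35)

# Linear predictor on the log-odds scale, then the inverse logit.
log_odds <- sum(beta * patient)
prob_alpha_trait <- plogis(log_odds)

round(log_odds, 3)
round(prob_alpha_trait, 3)
```

For these made-up values the predicted probability of alpha trait comes out at roughly 0.25, which is below the 0.5 cutoff, so this patient would be classified as a silent carrier. In practice `predict(logit_model, newdata = ..., type = "response")` does the same calculation with the unrounded coefficients.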

Column {.tabset data-width=500}
---

### Goodness of fit

```{r}
null_model <- glm(phenotype ~ 1, data = train, family = binomial)

anova(null_model, logit_model, test = "Chisq")
```

The likelihood ratio test (LRT) compares the logistic regression model to a null model containing only an intercept. The residual deviance decreases from 124.96 to 109.47 when the predictors are added, producing a test statistic of 15.487 on 3 degrees of freedom (p = 0.0014). This small p-value indicates that the predictors significantly improve model fit compared to the null model, so the included clinical variables carry meaningful explanatory power for predicting the phenotype of alpha thalassemia.
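
The test statistic and p-value can be reproduced by hand from the two deviances. The following is a minimal sketch using the rounded values shown in the analysis of deviance table, so the result agrees with `anova()` only up to rounding; the variable names are introduced here for illustration.

```r
# Deviances and residual degrees of freedom from the analysis of deviance table.
null_dev  <- 124.96
null_df   <- 103
model_dev <- 109.47
model_df  <- 100

lrt_stat <- null_dev - model_dev   # drop in deviance when predictors are added
lrt_df   <- null_df - model_df     # number of added predictors

# Upper-tail probability of a chi-squared distribution with lrt_df degrees of freedom.
p_value <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

c(statistic = lrt_stat, df = lrt_df, p = signif(p_value, 3))
```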

```{r}
pR2(logit_model)
```

Pseudo-R2 values provide additional measures of model fit for logistic regression. 
McFadden’s pseudo-R2 was 0.1239, which indicates a modest fit. 
The maximum-likelihood (Cox & Snell) R2 was 0.1384 (r2ML), similarly suggesting improvement over the null model. 
Nagelkerke’s pseudo-R2 (r2CU) was 0.1979, meaning the model achieves about 19.79% of the maximum possible improvement in fit relative to the null model. Overall, these values indicate that the logistic regression model provides a modest to moderate fit to the data and predicts phenotype better than the null model.
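
To make the three measures less of a black box, the sketch below recomputes them from the two log-likelihoods that `pR2()` reports (`llh` and `llhNull`), using the standard McFadden, Cox & Snell, and Nagelkerke formulas. The training-set size of 104 is taken from the 74 + 30 split shown in the Set up tab; the variable names are my own.

```r
# Log-likelihoods as printed by pR2(logit_model) in the rendered dashboard.
ll_model <- -54.7364213
ll_null  <- -62.4799152
n_train  <- 104   # 74 silent carriers + 30 alpha trait in the training data

# McFadden: proportional reduction in log-likelihood.
mcfadden <- 1 - ll_model / ll_null

# Cox & Snell (r2ML): based on the likelihood ratio, scaled by sample size.
cox_snell <- 1 - exp(2 * (ll_null - ll_model) / n_train)

# Nagelkerke (r2CU): Cox & Snell divided by its maximum attainable value.
nagelkerke <- cox_snell / (1 - exp(2 * ll_null / n_train))

round(c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke), 4)
```

Rounded to four decimals, these come out as 0.1239, 0.1384, and 0.1979, in line with the `pR2()` output.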

### Key effects

```{r}
or <- exp(coef(logit_model))
or
```

- For each additional 1 g/dL of hemoglobin concentration, the odds of the patient presenting the alpha trait phenotype decrease by about 50.12%, holding all other variables constant. Here, 50.12% comes from 1 − 0.4988 = 0.5012.

- For each 1x10^12 cells/L increase in red blood cell count, the odds of the patient presenting the alpha trait phenotype more than double (about 2.6 times), holding all other predictors constant.

- For each additional percentage point of white blood cells that are lymphocytes, the odds of the patient presenting the alpha trait phenotype decrease by about 3.18%, holding all other variables constant, although this effect is only marginally significant (p = 0.080). Here, 3.18% comes from 1 − 0.9682 = 0.0318.
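
The percentages quoted above follow directly from the odds ratios. As a small sketch, the percent change in odds per one-unit increase is (odds ratio − 1) × 100, where a negative value means a decrease; the odds ratios below are copied from the rendered `exp(coef(logit_model))` output, and `pct_change` is a name introduced here.

```r
# Odds ratios as printed in the rendered dashboard (intercept omitted).
odds_ratios <- c(hb = 0.4988216, rbc = 2.6006603, lymph = 0.9681505)

# Percent change in the odds of alpha trait per one-unit increase in each predictor.
pct_change <- (odds_ratios - 1) * 100

round(pct_change, 2)
```

This gives about −50.12% for `hb`, +160.07% for `rbc` (a 2.6-fold increase), and −3.18% for `lymph`.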

### CM

```{r}
test_prob <- predict(logit_model, newdata = test, type = "response")

test_pred <- ifelse(test_prob >= 0.5, "alpha_trait", "silent_carrier") %>%
  factor(levels = levels(test$phenotype))

cm <- confusionMatrix(test_pred, test$phenotype, positive = "alpha_trait")
cm
```

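Using a 0.5 probability cutoff on the test data, the model achieved an accuracy of 81.4%, with 50% sensitivity and 93.55% specificity. In other words, it correctly identified 6 of the 12 alpha trait cases and 29 of the 31 silent carriers.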
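
These summary statistics can be checked by hand from the four cells of the confusion matrix. The sketch below uses the counts from the rendered `confusionMatrix()` output with `alpha_trait` as the positive class; the short names `tp`, `fn`, `fp`, and `tn` are introduced here for illustration.

```r
# Cell counts from the rendered confusion matrix (positive class = alpha_trait).
tp <- 6    # alpha trait predicted as alpha trait
fn <- 6    # alpha trait predicted as silent carrier
fp <- 2    # silent carrier predicted as alpha trait
tn <- 29   # silent carrier predicted as silent carrier

accuracy    <- (tp + tn) / (tp + fn + fp + tn)
sensitivity <- tp / (tp + fn)   # share of alpha trait cases detected
specificity <- tn / (tn + fp)   # share of silent carriers correctly identified
ppv         <- tp / (tp + fp)   # positive predictive value
npv         <- tn / (tn + fn)   # negative predictive value

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv), 4)
```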

### ROC/AUC

```{r}
roc_obj <- roc(response = test$phenotype,
               predictor = test_prob,
               levels = c("silent_carrier", "alpha_trait"),
               direction = "<")

plot(roc_obj,
     print.auc = TRUE,
     legacy.axes = TRUE,
     main = "ROC Curve for Alpha Thalassemia Model")
```

The ROC curve yielded an AUC of 0.878, indicating excellent discrimination between patients with the silent carrier phenotype and those with the alpha trait phenotype.
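
One way to read an AUC is as the probability that a randomly chosen alpha trait patient receives a higher predicted probability than a randomly chosen silent carrier. The toy sketch below illustrates that reading in base R; the scores are hypothetical and are not output from this model, and the pairwise calculation is an illustration of the concept rather than how `pROC` computes the curve internally.

```r
# Hypothetical predicted probabilities (illustrative only, not model output).
scores_alpha_trait    <- c(0.9, 0.6, 0.4)
scores_silent_carrier <- c(0.5, 0.3, 0.2, 0.1)

# Compare every alpha trait score with every silent carrier score.
wins <- outer(scores_alpha_trait, scores_silent_carrier, ">")
ties <- outer(scores_alpha_trait, scores_silent_carrier, "==")

# AUC = share of pairs ranked correctly, counting ties as half.
auc_by_hand <- mean(wins + 0.5 * ties)
auc_by_hand
```

Here 11 of the 12 possible pairs are ranked correctly, so the toy AUC is 11/12, or about 0.917.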

### Conclusion

This logistic regression model demonstrates that a small set of clinical predictors provides explanatory and predictive power for assessing phenotype of Alpha Thalassemia patients, while maintaining easy interpretability for use in clinical screenings and application in disease prevention.

Discussion & Limitations
===

Column {.tabset data-width=1000}
---

### Conclusions

### Limitations

- small sample size

### Future Directions

- apply this to global demographic
- develop approaches for additional phenotypes of Alpha thalassemia

About the Author
===

Column {data-width=500}
---

### Background

My name is Ashley Williams and I am an undergraduate student attending the University of Dayton. I am majoring in Biology and I am minoring in Chemistry, Data Analytics, Neuroscience, and Research in the Biological Sciences. My anticipated graduation is in May of 2027.

I am an undergraduate researcher and have co-authorship of two peer-reviewed scientific papers, one from [2022](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010653) and one from [this year](https://academic.oup.com/mbe/article/42/9/msaf213/8248050)! I conduct my research in the [Williams Lab](https://thetomwilliamslab.com/), where I specifically study the regulation of the <i>Drosophila melanogaster pale</i> gene, and its origin during the evolution of a dimorphic pigmentation trait. I have been heavily involved in scientific research since 2021, and I have also presented my research on numerous occasions including twice at the University of Dayton's <span Style="color:#cf311e">Stander Symposium</span>, at <span Style="color:#267c28">the Society for Developmental Biology's 83rd Annual Meeting</span>, and at <span Style="color:#0066b6">the American Society for Biochemistry and Molecular Biology's conference, "Evolution and core processes in gene regulation"</span>.

I am interested in pursuing a Ph.D. in the field of genetics after my graduation, and continuing my career in academia and biological research.

Column {data-width=500}
---
  
### Presenting

```{r, fig.width=6, echo=FALSE, fig.align='right'}
knitr::include_graphics("IMG_6751.jpeg")
```